This data analysis tries to shed light on a few controversial questions regarding the COVID19 epidemic and the governmental restrictions on freedom of movement and association. It is based on demographic data rather than on the COVID19 mortality data reported daily by the mainstream press.
My approach is to compare the number of deaths in 2020 (for the weeks for which mortality data are available) with the number of deaths that would be expected given the structure of the population on January 1st, 2020 and the usual death rates. The difference between observed and expected deaths gives a metric called excess deaths when it is positive, or death deficit when it is negative.
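As an illustration of the definition above, here is a minimal Python sketch (not the author's R code) that computes weekly excess deaths from two series of equal length. The function name and the weekly counts are hypothetical.

```python
def excess_deaths(observed, expected):
    """Weekly difference between observed and expected deaths.

    Hypothetical helper for illustration. Positive values are excess
    deaths; negative values are a death deficit.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must cover the same weeks")
    return [o - e for o, e in zip(observed, expected)]


# Hypothetical weekly counts, for illustration only.
weekly = excess_deaths([2100, 2350, 2900, 2050], [2200, 2250, 2300, 2150])
print(weekly)       # [-100, 100, 600, -100]
print(sum(weekly))  # 500
```

Summing the weekly values over a period gives the net excess (or deficit) for that period.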
Unlike COVID19 mortality data, demographic data are:
This analysis includes data from the following countries:
| Country |
|---|
| Belgium |
| Bulgaria |
| Croatia |
| Czechia |
| Denmark |
| Estonia |
| Finland |
| France |
| Germany |
| Greece |
| Hungary |
| Iceland |
| Italy |
| Lithuania |
| Luxembourg |
| Netherlands |
| Poland |
| Slovakia |
| Slovenia |
| Spain |
| Sweden |
| Switzerland |
The graphs have been scaled so that they can be compared between countries: the scale is normalized by the number of inhabitants aged 65 or more. However, any comparison between countries should still be made with care. For most graphs, the most significant caveat is that the availability of data for the latest weeks varies between countries.
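The article does not state the exact scaling factor. The following Python sketch assumes deaths are expressed per 100,000 inhabitants aged 65 or more; the function name and the factor are illustrative assumptions.

```python
def per_100k_aged_65_plus(deaths, population_by_age):
    """Deaths per 100,000 inhabitants aged 65 or more.

    Hypothetical helper: population_by_age maps an age in years to the
    number of inhabitants of that age. The 100,000 factor is an
    illustrative assumption, not taken from the original scripts.
    """
    elderly = sum(n for age, n in population_by_age.items() if age >= 65)
    if elderly == 0:
        raise ValueError("no inhabitants aged 65 or more")
    return deaths * 100_000 / elderly


# Hypothetical country: 200,000 inhabitants aged 65+, 400 deaths in a week.
print(per_100k_aged_65_plus(400, {30: 500_000, 70: 150_000, 85: 50_000}))  # 200.0
```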
This article, as well as all the R scripts used to compute the data, is hosted on GitHub at https://github.com/gfraiteur/mortality. If you have a question or remark related to this article, or if you have found a bug or inaccuracy, please open an issue on GitHub or, better, submit a pull request.
All data comes from open sources and can be freely downloaded.
Population, death counts and death rates are sourced from the Human Mortality Database. The demography R package is used where possible; otherwise the data are downloaded from the CSV files.
COVID19 mortality data are downloaded from Our World in Data.
Mobility data come from the Google COVID-19 Community Mobility Reports.
Government restriction data are from the Oxford Covid-19 Government Response Tracker.
Before we start analyzing excess mortality, it is interesting to visualize the structure of the population.
Our model of yearly mortality has two inputs:
- The structure of the population on January 1st of each year between 2009 and 2020, i.e. the number of residents of a given age and sex alive on that day.
- The death rates for the corresponding age, sex and year, i.e. the probability that a person alive on the morning of January 1st is dead by the evening of December 31st.
The predictive model is then built as follows:
1. The expected yearly death count is the product of the number of inhabitants and the death rate for the given age and sex. These data are then aggregated by 5-year age groups.
2. A histogram of the distribution of yearly mortality by week of year is computed by averaging over recent years.
3. The weekly mortality model is built by multiplying the yearly mortality model by the weekly distribution model.
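Step 1 can be sketched in Python as follows. This is an illustration under stated assumptions, not the author's R code: population and death rates are assumed to be dicts keyed by (sex, age), and the function name is hypothetical.

```python
from collections import defaultdict


def expected_yearly_deaths(population, death_rate):
    """Expected yearly deaths aggregated by sex and 5-year age group.

    Hypothetical helper: both arguments are dicts keyed by (sex, age);
    this key layout is an assumption of the sketch.
    """
    totals = defaultdict(float)
    for (sex, age), inhabitants in population.items():
        group = (age // 5) * 5  # e.g. ages 65 to 69 fall into group 65
        totals[(sex, group)] += inhabitants * death_rate[(sex, age)]
    return dict(totals)


population = {("F", 65): 1000, ("F", 66): 1000, ("M", 70): 2000}
death_rate = {("F", 65): 0.01, ("F", 66): 0.02, ("M", 70): 0.03}
print(expected_yearly_deaths(population, death_rate))
```

Aggregating after the multiplication keeps the single-year age detail in the product while reporting results by 5-year groups.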
This process is described below.
The first input of our model is the structure of the population. The data set gives us the number of inhabitants of each sex who are alive and have a specific age on January 1st of a given year.
When the population structure is not available up to 2020, we use linear regressions, for each sex and age, to complete the missing years. The coefficients are computed from the last 5 years for which data are available. Typically only the last 1 or 2 years of data are missing, so a linear extrapolation (as opposed to a more complex model such as Lee-Carter) is considered sufficient.
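A minimal Python sketch of this extrapolation for one (sex, age) series, using an ordinary least-squares fit on the last 5 known years. The function names and the dict layout (year to inhabitants) are assumptions; the original R scripts may organize the data differently.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def extrapolate_population(history, target_years, window=5):
    """Extend one (sex, age) population series with a linear trend.

    Hypothetical helper: history maps year -> inhabitants. Only the last
    `window` known years are used for the fit; known years are kept as-is.
    """
    known = sorted(history)[-window:]
    a, b = linear_fit(known, [history[y] for y in known])
    result = dict(history)
    for year in target_years:
        result.setdefault(year, a + b * year)
    return result


history = {2014: 100, 2015: 110, 2016: 120, 2017: 130, 2018: 140}
print(extrapolate_population(history, [2019, 2020]))
```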
(Note that we also use this linear regression to project the population structure to 2021, which can be off by up to 10% because of the excess mortality in 2020. Excess mortality in 2021 can therefore be overestimated by up to 10%, depending on the country.)
The second input is the historical death rates for each age and sex. When a data point is missing for a given year, it is interpolated from the previous and following years for the given age and sex.
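A sketch of this gap filling in Python, assuming a missing value is replaced by the mean of the previous and following years (linear interpolation over a one-year gap). The function name and the use of None for missing values are assumptions.

```python
def fill_single_gaps(rates):
    """Fill a missing year with the mean of the previous and next years.

    Hypothetical helper: rates maps year -> death rate, with None for a
    missing value. Gaps without a known value on both sides stay None.
    """
    filled = dict(rates)
    for year, value in rates.items():
        before, after = rates.get(year - 1), rates.get(year + 1)
        if value is None and before is not None and after is not None:
            filled[year] = (before + after) / 2
    return filled


print(fill_single_gaps({2010: 0.010, 2011: None, 2012: 0.012, 2013: None}))
```

Missing values at the end of a series (such as 2013 here) are not filled; in the article these trailing years are covered by the regression model instead.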
The death rates are typically not known for 2019 and 2020, and in some cases for a few more past years.
We model the death rate with a linear regression for each age and sex. This model allows us to extrapolate the data to 2019 and 2020. Note that the model does not use the empirical death rates, but only the linear regression itself. This approach removes the year-to-year variations for all previous years; that is, the death rate model removes the effect of epidemics and weather events that do not recur every year.
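The following Python sketch shows the idea for one (sex, age) series: fit a straight line to the empirical rates, then use the fitted values (not the empirical ones) for every year, including the extrapolated ones. The function name and data layout are hypothetical.

```python
def death_rate_model(observed, years):
    """Linear trend of one (sex, age) death-rate series.

    Hypothetical helper: observed maps year -> empirical death rate. The
    fitted trend is returned for every year in `years`, which smooths out
    year-to-year fluctuations and extrapolates to years without data.
    """
    xs = sorted(observed)
    ys = [observed[x] for x in xs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    # Centering on mean_x keeps the arithmetic well conditioned for calendar years.
    return {year: mean_y + slope * (year - mean_x) for year in years}


# Hypothetical series declining by 0.001 per year.
observed = {2009: 0.030, 2010: 0.029, 2011: 0.028, 2012: 0.027}
model = death_rate_model(observed, range(2009, 2021))
print(round(model[2019], 6), round(model[2020], 6))
```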
Once we have a death rate model for each group and year, we multiply this rate by the actual (or extrapolated) population of this age group and year, which gives the number of expected deaths.
However, the expected and observed numbers of deaths, summed from 2009 to 2019, don’t match exactly. This discrepancy is expected and its cause is not important here. To cancel it, we compute a correction factor for each sex and age group and apply it so that the totals from 2009 to 2019 match exactly.
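One natural way to obtain such a factor, sketched here in Python as an assumption rather than the author's exact method, is the ratio of the observed total to the expected total over 2009 to 2019. Multiplying every expected value by this ratio makes the totals match.

```python
def correction_factor(observed, expected, first=2009, last=2019):
    """Ratio that makes the model total match the observed total.

    Hypothetical helper: observed and expected map year -> deaths for one
    sex and age group.
    """
    years = range(first, last + 1)
    return sum(observed[y] for y in years) / sum(expected[y] for y in years)


observed = {y: 110 for y in range(2009, 2020)}
expected = {y: 100.0 for y in range(2009, 2020)}
factor = correction_factor(observed, expected)
corrected = {y: e * factor for y, e in expected.items()}
print(round(factor, 6))  # 1.1
```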
The next graphs show the yearly mortality model and compare it to empirical data:
We now have a yearly model, but we need weekly projections. For the weekly model, we first compute, for each sex and age group, the percentage of deaths that occur in a given week of the year, and we multiply this coefficient by the yearly death rate for the year. Note that we apply a 3-week centered rolling mean to the data series before aggregating by week of year.
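A Python sketch of the weekly pattern for one sex and age group, assuming one list of 52 weekly counts per year. How the original scripts handle the edges of the rolling window and week 53 is not stated, so the choices below (keep raw values at the edges, ignore week 53) are assumptions; function names are hypothetical.

```python
def centered_rolling_mean(values, width=3):
    """Centered moving average; edge points without a full window keep their value."""
    half = width // 2
    smoothed = list(values)
    for i in range(half, len(values) - half):
        smoothed[i] = sum(values[i - half:i + half + 1]) / width
    return smoothed


def week_of_year_shares(weekly_deaths_by_year, weeks=52):
    """Fraction of yearly deaths falling in each week of the year.

    Hypothetical helper: weekly_deaths_by_year maps year -> list of `weeks`
    counts. Years are concatenated chronologically before smoothing, so the
    window can cross New Year. Week 53 is not handled in this sketch.
    """
    years = sorted(weekly_deaths_by_year)
    if any(len(weekly_deaths_by_year[y]) != weeks for y in years):
        raise ValueError("every year must have exactly `weeks` values")
    series = [d for y in years for d in weekly_deaths_by_year[y]]
    smoothed = centered_rolling_mean(series)
    totals = [0.0] * weeks
    for i, value in enumerate(smoothed):
        totals[i % weeks] += value  # aggregate by week of year
    grand_total = sum(totals)
    return [t / grand_total for t in totals]


shares = week_of_year_shares({2018: [100] * 52, 2019: [100] * 52})
print(round(sum(shares), 6))  # 1.0
```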
To get the weekly mortality, we take the yearly mortality computed from the population structure as of January 1st of the year and multiply it, for each age group and sex, by the weekly pattern.
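This multiplication can be sketched in Python as follows, summing over groups to obtain the expected deaths per week for the whole population. The dict layout and function name are assumptions; the example uses two "weeks" to stay short.

```python
def expected_weekly_deaths(yearly_expected, shares):
    """Expected deaths for each week of the year, summed over groups.

    Hypothetical helper: yearly_expected maps (sex, age_group) -> expected
    deaths for the year; shares maps the same keys -> per-week fractions
    that sum to 1.
    """
    n_weeks = len(next(iter(shares.values())))
    weekly = [0.0] * n_weeks
    for group, total in yearly_expected.items():
        for week, share in enumerate(shares[group]):
            weekly[week] += total * share
    return weekly


yearly_expected = {("F", 80): 100.0, ("M", 80): 50.0}
shares = {("F", 80): [0.6, 0.4], ("M", 80): [0.5, 0.5]}
print(expected_weekly_deaths(yearly_expected, shares))
```

Because each group's shares sum to 1, the weekly values add back up to the yearly total.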
Now that we have a predictive model, we can compare the actual mortality with the mortality that would be expected in a “normal” (neither good nor bad) year.
Where possible, data are shown from 2010 onward so that the mortality peaks of 2020 can be compared with those of recent years.
The following graphs are identical but focus on 2020:
It is also interesting to look at cumulative excess mortality over a long period of time. In most countries, there is a succession of one or two good years followed by one or two bad years. These graphs allow us to visualize how, and how fast, good years compensate for bad ones, and to compare 2020 with previous bad years.
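Cumulative excess mortality is the running sum of excess deaths. A short Python sketch using `itertools.accumulate`, with hypothetical yearly figures:

```python
from itertools import accumulate


def cumulative_excess(observed, expected):
    """Running sum of observed minus expected deaths (hypothetical helper)."""
    return list(accumulate(o - e for o, e in zip(observed, expected)))


# Hypothetical years: two good years, then a bad one.
print(cumulative_excess([100, 95, 97, 110], [100, 100, 100, 100]))  # [0, -5, -8, 2]
```

A bad year that follows good years first brings the curve back toward zero before pushing it above.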
In most epidemics and other events, the most vulnerable people tend to be the most affected. Some of these people would probably have died some time later of another cause. This phenomenon, when it exists, is visible on cumulative excess mortality graphs: the steeper the downward slope after the epidemic peak, the less the event actually shortened people’s lives.
The excess mortality rate in 2020 is the additional risk to which a person of a given age group and sex was exposed, compared with the expected rate in a “normal” year. For instance, an excess mortality rate of 2% means that a person in that group had 2 more “chances” out of 100 to die in 2020 than in a normal year.
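A minimal Python sketch of this rate for one group, assuming it is the excess deaths divided by the group's population on January 1st. The function name and figures are hypothetical; the example reproduces the 2% case.

```python
def excess_mortality_rate(observed_deaths, expected_deaths, population):
    """Excess deaths per inhabitant of the group, as a fraction.

    Hypothetical helper; assumes the denominator is the population of the
    group on January 1st.
    """
    return (observed_deaths - expected_deaths) / population


# Hypothetical group: 1,000 inhabitants, 50 expected deaths, 70 observed.
print(f"{excess_mortality_rate(70, 50, 1000):.1%}")  # 2.0%
```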
Note that the excess mortality rate for 2020 is computed from incomplete data.
The following graphs show the excess mortality rate in 2020:
The following graph compares the mortality rate in 2020 with the expected mortality rate. Since data from 2020 are incomplete, this rate is computed as the expected yearly mortality rate for the whole of 2020 plus the excess mortality rate computed for the period where data are available.
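The computation described above can be sketched in Python as follows. The function name and parameters are hypothetical, and the excess rate is assumed to be divided by the group's population on January 1st.

```python
def estimated_mortality_rate_2020(expected_yearly_rate, observed_to_date,
                                  expected_to_date, population):
    """Expected full-year rate plus the excess rate over the weeks with data.

    Hypothetical helper: observed_to_date and expected_to_date are death
    counts for the period where 2020 data are available.
    """
    excess_rate = (observed_to_date - expected_to_date) / population
    return expected_yearly_rate + excess_rate


# Hypothetical group: 5% expected yearly rate, 45 observed vs 40 expected deaths so far.
print(round(estimated_mortality_rate_2020(0.05, 45, 40, 1000), 6))  # 0.055
```

Weeks without data implicitly contribute no excess, so the estimate can only reflect the period observed so far.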